
    From Phonemes to Robot Commands with a Neural Parser

    The understanding of how children acquire language [1][2], from phonemes to syntax, could be improved by computational models, in particular when they are integrated in robots [3]: e.g. by interacting with users [4] or grounding language cues [5]. Recently, speech recognition systems have greatly improved thanks to deep learning. However, for specific domain applications like Human-Robot Interaction, generic recognition tools such as the Google API often provide words that are unknown by the robotic system, when not simply irrelevant [6]. Additionally, such recognition systems do not give much indication of how our brains acquire or process these phonemes, words or grammatical constructions (i.e. sentence templates). Moreover, to our knowledge they do not provide useful tools for learning from the small corpora a child may bootstrap from. Here, we propose a neuro-inspired approach that processes sentences word by word, or phoneme by phoneme, with no prior knowledge of the semantics of the words. Previously, we demonstrated that this RNN-based model was able to generalize over grammatical constructions [7] even with unknown words (i.e. words outside the vocabulary of the training data) [8]. In this preliminary study, in order to try to overcome word misrecognition, we tested whether the same architecture can solve the same task directly by processing phonemes instead of grammatical constructions [9]. Applied to a small corpus, we see that the model has similar (if slightly weaker) performance when using phonemes as inputs instead of grammatical constructions. We speculate that this phoneme-based version could outperform the previous model when dealing with real, noisy phoneme inputs, thus improving its performance in real-time human-robot interaction.
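    As a rough illustration of the phoneme-by-phoneme input regime described above, here is a minimal sketch of how an utterance could be fed to such a reservoir model, one phoneme per time step. The ARPAbet-like phoneme inventory and the one-hot coding are illustrative assumptions, not the paper's exact corpus or encoding.

```python
import numpy as np

# Hypothetical mini phoneme inventory (ARPAbet-like); the actual model's
# inventory and coding scheme may differ.
PHONEMES = ["P", "UH", "T", "DH", "AH", "B", "AO", "L", "OW", "N",
            "EH", "F", "R", "AY", "G", "S", "<sil>"]
P2I = {p: i for i, p in enumerate(PHONEMES)}

def one_hot_sequence(phonemes):
    """Encode a phoneme sequence as a (T, n_phonemes) one-hot matrix,
    one time step per phoneme, as the reservoir would receive it."""
    X = np.zeros((len(phonemes), len(PHONEMES)))
    for t, p in enumerate(phonemes):
        X[t, P2I[p]] = 1.0
    return X

# "put the ball" rendered as a flat phoneme stream, no word boundaries
utterance = ["P", "UH", "T", "DH", "AH", "B", "AO", "L", "<sil>"]
X = one_hot_sequence(utterance)
print(X.shape)  # (9, 17): 9 time steps, 17-dimensional input
```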

    Which Input Abstraction is Better for a Robot Syntax Acquisition Model? Phonemes, Words or Grammatical Constructions?

    Corresponding code at https://github.com/neuronalX/Hinaut2018_icdl-epirob. There has been considerable progress in speech recognition systems these last years [13]. The word recognition error rate went down with the arrival of deep learning methods. However, if one uses a cloud-based speech API and integrates it inside a robotic architecture [33], one still encounters a considerable number of wrongly recognized sentences. Thus speech recognition cannot be considered solved, especially when an utterance is considered in isolation from its context. Particular solutions, adaptable to different Human-Robot Interaction applications and contexts, have to be found. In this perspective, the way children learn language and how our brains process utterances may help us improve how robots process language. Drawing inspiration from language acquisition theories and from how the brain processes sentences, we previously developed a neuro-inspired model of sentence processing. In this study, we investigate how this model can process different levels of abstraction as input: sequences of phonemes, sequences of words, or grammatical constructions. We see that even though the model was previously tested only on grammatical constructions, it performs better with word and phoneme inputs.
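    To make the three input abstractions concrete, here is a small sketch of the same command sentence at each granularity. The phonemic transcription, the word classes, and the slot marker are illustrative assumptions, not the paper's exact corpus.

```python
# Three input abstractions for the same utterance; transcription and
# word classes here are illustrative, not the paper's exact corpus.
sentence = "put the toy on the left"

# 1. Sequence of words (one input unit per word)
words = sentence.split()

# 2. Sequence of phonemes (sub-word level, no word boundaries needed)
phonemes = ["P","UH","T","DH","AH","T","OY","AA","N","DH","AH","L","EH","F","T"]

# 3. Grammatical construction: semantic words (nouns/verbs) replaced
#    by a shared slot marker, function words kept as-is
semantic = {"put", "toy", "left"}
construction = [w if w not in semantic else "X" for w in words]

print(words)         # ['put', 'the', 'toy', 'on', 'the', 'left']
print(construction)  # ['X', 'the', 'X', 'on', 'the', 'X']
```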

    Reservoir SMILES: Towards SensoriMotor Interaction of Language and Embodiment of Symbols with Reservoir Architectures

    Language involves several hierarchical levels of abstraction. Most models focus on a particular level of abstraction, making them unable to model both bottom-up and top-down processes. Moreover, we do not know how the brain grounds symbols in perceptions, nor how these symbols emerge throughout development. Experimental evidence suggests that perception and action shape one another (e.g. motor areas are activated during speech perception), but the precise mechanisms involved in this action-perception shaping at various levels of abstraction are still largely unknown. My previous and current work includes the modelling of language comprehension, language acquisition from a robotic perspective, sensorimotor models, and extended Reservoir Computing models of working memory and hierarchical processing. I propose to create a new generation of neural-based computational models of language processing and production; to use biologically plausible learning mechanisms relying on recurrent neural networks; to create novel sensorimotor mechanisms to account for action-perception shaping; to build hierarchical models from the sensorimotor to the sentence level; and to embody such models in robots.

    La Course 12--4--90

    In the beginning was the line, theoretical and infinite, like time. This line can be seen as a timeline, or a spatio-temporal strip: a tape allowing a "read head" to compute whatever is computable. It can be drawn physically, step by step, so as to fill space with improvised or computed patterns. The performances proposed here draw on both the sciences and the arts in order to weave links, lines of contact, between scientific and artistic perceptions, too often seen as disjoint. In particular, we will see how to connect the work of Alan Turing to that of Keith Haring, two major scientific and artistic icons of the 20th century, through interactions between a small Ozobot robot and a human, who together draw trajectories and "code-instructions" for this robot.

    Recurrent Neural Network for Syntax Learning with Flexible Representations

    We present a Recurrent Neural Network (RNN), namely an Echo State Network (ESN), that performs sentence comprehension and can be used for Human-Robot Interaction (HRI). The RNN is trained to map sentence structures to meanings (e.g. predicates). We have previously shown that this ESN is able to generalize to unknown sentence structures in English and French. The meaning representations it learns to produce are flexible: one can use any kind of "series of slots" (or, more generally, any vector representation), not only predicates. Moreover, preliminary work has shown that the model can be trained fully incrementally, which enables the exploration of language acquisition in a developmental approach. Furthermore, an "inverse" version of the model has also been studied, which produces sentence structures from meaning representations. Therefore, if these two models are combined in the same agent, one can investigate the emergence of language (and in particular of syntax) through agent-based simulations. The model has been encapsulated in a ROS module, which allows its use in a cognitive robotic architecture or in a distributed agent simulation.
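    To illustrate what such a flexible "series of slots" meaning representation could look like, here is a minimal sketch of a slot-based teacher signal for the readout. The role names, slot count, and layout are hypothetical, chosen only to show the idea of a free-form vector target.

```python
import numpy as np

# Hypothetical slot-based meaning coding: for each semantic word kept in
# working memory, the readout indicates which role it fills. Roles and
# sizes are illustrative, not the model's exact output layout.
ROLES = ["predicate", "agent", "object", "location"]
MAX_SEMANTIC_WORDS = 4  # slots for up to 4 semantic words per sentence

def meaning_vector(assignments):
    """assignments: list of role names, one per semantic word in order
    of appearance. Returns a flat (MAX_SEMANTIC_WORDS * len(ROLES),)
    target vector for the readout."""
    M = np.zeros((MAX_SEMANTIC_WORDS, len(ROLES)))
    for slot, role in enumerate(assignments):
        M[slot, ROLES.index(role)] = 1.0
    return M.flatten()

# "put the toy on the left": put -> predicate, toy -> object, left -> location
y = meaning_vector(["predicate", "object", "location"])
print(y.reshape(MAX_SEMANTIC_WORDS, len(ROLES)))
```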

    From Phonemes to Sentence Comprehension: A Neurocomputational Model of Sentence Processing for Robots

    There has been important progress in speech recognition systems these last years. The word recognition error rate went down with the arrival of deep learning methods. However, if one uses a cloud speech API and integrates it inside a robotic architecture, one faces a non-negligible number of wrongly recognized sentences. Thus speech recognition cannot be considered solved (because many sentences are ambiguous out of their context). We believe that contextual solutions (i.e. adaptable and trainable on different HRI applications) have to be found. In this perspective, the way children learn language and how our brains process utterances may help us improve how robots process language. Drawing inspiration from language acquisition theories and from how the brain processes sentences, we previously developed a neuro-inspired model of sentence processing. In this study, we investigate how this model can process different levels of abstraction as input: sequences of phonemes, sequences of words, or grammatical constructions. We see that even though the model was previously tested only on grammatical constructions, it performs better with word and phoneme inputs.

    A Robust Model of Gated Working Memory

    Gated working memory is defined as the capacity to hold arbitrary information at any time in order to use it at a later time. Based on electrophysiological recordings, several computational models have tackled the problem using dedicated and explicit mechanisms. We propose instead to consider an implicit mechanism based on a random recurrent neural network. We introduce a robust yet simple reservoir model of gated working memory with instantaneous updates. The model is able to store an arbitrary real value, presented at a random time, over an extended period. The dynamics of the model is a line attractor that learns to exploit reentry and a non-linearity during the training phase, using only a few representative values. A deeper study of the model shows that the results actually hold over a large range of hyperparameters (number of neurons, sparsity, global weight scaling, etc.), such that any large enough population mixing excitatory and inhibitory neurons can quickly learn to realize such gated working memory. In a nutshell, with a minimal set of hypotheses, we show that we can have a robust model of working memory. This suggests that this property could be an implicit property of any random population, acquired through learning. Furthermore, considering working memory to be a physically open but functionally closed system, we account for some counter-intuitive electrophysiological recordings.
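    As a rough illustration of the task and of the reentry mechanism described above, here is a minimal plain-NumPy sketch of a reservoir with teacher-forced output feedback trained to hold a gated value. Sizes, scalings and the ridge parameter are illustrative, not the paper's hyperparameters; at test time the readout's own prediction would be fed back instead of the teacher signal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Gated working-memory task: the target holds the value seen at the
# last time step where the gate was on.
T = 2000
value = rng.uniform(-1, 1, T)                 # continuous input stream
gate = (rng.random(T) < 0.02).astype(float)   # sparse update trigger
target = np.zeros(T)
held = 0.0
for t in range(T):
    if gate[t]:
        held = value[t]
    target[t] = held

# Minimal ESN with output reentry (sizes and scalings are illustrative).
N = 300
Win = rng.uniform(-1, 1, (N, 3))              # inputs: value, gate, reentry
W = rng.normal(0, 1, (N, N))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))     # spectral radius ~0.9
x = np.zeros(N)
states = np.zeros((T, N))
for t in range(T):
    fb = target[t - 1] if t > 0 else 0.0      # teacher-forced output feedback
    x = np.tanh(Win @ np.array([value[t], gate[t], fb]) + W @ x)
    states[t] = x

# Linear readout trained by ridge regression on the collected states.
ridge = 1e-6
Wout = np.linalg.solve(states.T @ states + ridge * np.eye(N),
                       states.T @ target)
print("train MSE:", np.mean((states @ Wout - target) ** 2))
```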

    Learning to Parse Grounded Language using Reservoir Computing

    New models for language processing and learning based on Reservoir Computing have recently become popular. However, these models are typically not grounded in sensorimotor systems and robots. In this paper, we develop a Reservoir Computing model called Reservoir Parser (ResPars) for learning to parse natural language from grounded data coming from humanoid robots. Previous work showed that ResPars can perform syntactic generalization over different sentences (surface structure) with the same meaning (deep structure). We argue that this ability is key to guiding linguistic generalization in a grounded architecture. We show that ResPars is able to generalize on grounded compositional semantics by combining it with Incremental Recruitment Language (IRL). Additionally, we show that ResPars can learn to generalize on the same sentences processed not word by word but as an unsegmented sequence of phonemes. This enables the architecture not to rely only on the words recognized by a speech recognizer, but to process the sub-word level directly. We additionally test the model's robustness to word recognition errors.
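    One simple way such robustness could be probed is by injecting synthetic recognizer errors into test sentences. The sketch below is a toy noise model with hypothetical substitution and deletion rates, not the paper's evaluation protocol.

```python
import random

random.seed(0)
VOCAB = ["put", "grasp", "point", "the", "toy", "ball", "left", "right", "on"]

def corrupt(words, p_sub=0.1, p_del=0.05):
    """Simulate speech-recognizer errors with random substitutions and
    deletions; a toy noise model, not the paper's protocol."""
    out = []
    for w in words:
        r = random.random()
        if r < p_del:
            continue                          # word dropped by the recognizer
        elif r < p_del + p_sub:
            out.append(random.choice(VOCAB))  # misrecognized as another word
        else:
            out.append(w)                     # correctly recognized
    return out

print(corrupt("put the toy on the left".split()))
```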

    Online Language Learning to Perform and Describe Actions for Human-Robot Interaction

    The goal of this research is to provide a real-time and adaptive spoken language interface between humans and a humanoid robot. The system should be able to learn new grammatical constructions in real-time, and then use them immediately or in a later interactive session. To achieve this we use a recurrent neural network of 500 neurons: an echo state network with leaky neurons [1]. The model processes sentences as grammatical constructions, in which the semantic words (nouns and verbs) are extracted and stored in working memory, while the grammatical words (prepositions, auxiliary verbs, etc.) are inputs to the network. The trained network outputs code the role (predicate, agent, object/location) that each semantic word takes. In the final output, the stored semantic words are then mapped onto their respective roles. The model thus learns the mappings between the grammatical structure of sentences and their meanings.

    The humanoid robot is an iCub [2] which interacts around an instrumented tactile table (ReacTable™) on which objects can be manipulated by both human and robot. A sensory system has been developed to extract spatial relations. Off-the-shelf speech recognition and text-to-speech tools allow spoken communication. In parallel, the robot has a small set of actions (put(object, location), grasp(object), point(object)). These spatial relations and action definitions form the meanings that are to be linked to sentences in the learned grammatical constructions.

    The target behavior of the system covers two conditions. In action performing (AP), the system should learn to generate the proper robot command, given a spoken input sentence. In scene description (SD), the system should learn to describe scenes given the extracted spatial relations. A training corpus for the neural model can be generated through interaction, with the user teaching the robot by describing spatial relations or actions, creating sentence-meaning pairs; it can also be edited by hand to avoid speech recognition errors. The interactions between the different components of the system are shown in Figure 1.

    The neural model processes grammatical constructions in which semantic words (e.g. put, grasp, toy, left, right) are replaced by a common marker, using only a predefined set of grammatical words (after, and, before, it, on, the, then, to, you); a sketch of this transformation is given after the figure caption below. The model is therefore able to deal with any sentence that has the same construction as a previously seen sentence. In the AP condition, we demonstrate that the model can learn and generalize to complex sentences such as "Before you put the toy on the left point the drums.": the robot will first point to the drums and then put the toy on the left, showing that the network is able to establish the proper chronological order of actions. Likewise, in the SD condition, the system can be exposed to a new scene and produce a description such as "To the left of the drums and to the right of the toy is the trumpet."

    In future research we plan to exploit this learning system in the context of human language development. In addition, the neural model could enable error recovery from speech-to-text recognition.

    Index Terms: human-robot interaction, echo state network, online learning, iCub, language learning.

    References: [1] H. Jaeger, "The 'echo state' approach to analysing and training recurrent neural networks", Tech. Rep. GMD. The model has been developed with the Oger toolbox: http://reservoir-computing.org/organic/engine.
Figure 1: Communication between the speech recognition tool (that also controls the robotic platform) and the neural model
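    The construction-extraction step referenced above can be sketched directly from the closed grammatical-word class listed in the abstract. The marker name and the function itself are illustrative; only the word list comes from the text.

```python
GRAMMATICAL_WORDS = {"after", "and", "before", "it", "on", "the", "then",
                     "to", "you"}  # closed class listed in the abstract
MARKER = "SW"  # shared marker for semantic words (name is illustrative)

def to_construction(sentence):
    """Replace every word outside the closed grammatical class by a common
    marker, yielding the grammatical construction; also return the semantic
    words kept aside in working memory for later role binding."""
    construction, semantic = [], []
    for w in sentence.lower().split():
        if w in GRAMMATICAL_WORDS:
            construction.append(w)
        else:
            construction.append(MARKER)
            semantic.append(w)
    return construction, semantic

c, s = to_construction("Before you put the toy on the left point the drums")
print(c)  # ['before', 'you', 'SW', 'the', 'SW', 'on', 'the', 'SW', 'SW', 'the', 'SW']
print(s)  # ['put', 'toy', 'left', 'point', 'drums']
```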